# Continuous pre-training optimization
Llama 3.3 Swallow 70B Instruct V0.4
Llama 3.3 Swallow is a 70B-parameter large language model built through continuous pre-training of Meta's Llama 3.3, enhancing Japanese capabilities while retaining the original English proficiency.
Large Language Model
Transformers · Supports Multiple Languages

tokyotech-llm · 874 downloads · 3 likes
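A minimal usage sketch with the transformers library. The Hugging Face repo id below is an assumption based on the model name, and a 70B checkpoint needs multi-GPU sharding (or a quantized variant) to load.

```python
# Minimal sketch: chat-style generation with the Swallow instruct model.
# The repo id is assumed from the model name, not confirmed by this listing.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tokyotech-llm/Llama-3.3-Swallow-70B-Instruct-v0.4"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # shard the 70B weights across available GPUs
)

# A Japanese prompt, since Japanese ability is the model's selling point.
messages = [{"role": "user", "content": "日本の四季について簡単に説明してください。"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```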
Gemma 2 Llama Swallow 9b It V0.1
The Gemma-2-Llama-Swallow series comprises multilingual large language models built through continuous pre-training of Gemma 2, with particular enhancement of Japanese ability.
Large Language Model
Transformers · Supports Multiple Languages

tokyotech-llm · 2,491 downloads · 3 likes
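The same pattern through the higher-level pipeline API, which applies the chat template automatically; the repo id is again an assumption based on the model name.

```python
# Minimal sketch: chat generation via the transformers pipeline API.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="tokyotech-llm/Gemma-2-Llama-Swallow-9b-it-v0.1",  # assumed repo id
    torch_dtype="auto",
    device_map="auto",
)

messages = [{"role": "user", "content": "富士山はどこにありますか？"}]
result = generator(messages, max_new_tokens=128)
# With chat-format input, generated_text holds the whole conversation;
# the last message is the model's reply.
print(result[0]["generated_text"][-1]["content"])
```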
Llama 3.1 8B UltraLong 4M Instruct
A large language model designed for processing ultra-long text sequences. The UltraLong series is released in 1-million-, 2-million-, and 4-million-token context variants; this 4M model supports up to 4 million tokens while maintaining strong performance on standard benchmarks.
Large Language Model
Transformers · English

nvidia · 264 downloads · 27 likes
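A sketch of how one might check and exercise the extended context window. The repo id is an assumption, and a prompt anywhere near 4 million tokens needs far more memory than a single GPU, so this only inspects the config and runs a modestly long input.

```python
# Minimal sketch: inspect the advertised context window, then summarize
# a long document. Repo id assumed; keep the demo input modest, since
# true multi-million-token prompts require large multi-GPU memory.
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Llama-3.1-8B-UltraLong-4M-Instruct"  # assumed repo id

config = AutoConfig.from_pretrained(model_id)
print(config.max_position_embeddings)  # should report the extended window

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

long_document = open("report.txt").read()  # placeholder long input
prompt = f"Summarize the following document.\n\n{long_document}\n\nSummary:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```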
SurgicBERTa
SurgicBERTa is a language model built on the RoBERTa-base architecture and adapted specifically to surgical textbooks and academic papers.
Large Language Model
Transformers

marcobombieri · 26 downloads · 3 likes
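Since SurgicBERTa is RoBERTa-based, it is a masked language model rather than a text generator, so fill-mask is the natural way to probe it. A minimal sketch, assuming the repo id marcobombieri/surgicberta:

```python
# Minimal sketch: probe the masked LM with a surgical sentence.
# The repo id is assumed from the publisher and model name.
from transformers import pipeline

fill = pipeline("fill-mask", model="marcobombieri/surgicberta")  # assumed repo id

# RoBERTa-style tokenizers use <mask> as the mask token.
for pred in fill("The gallbladder was dissected from the liver <mask>."):
    print(f"{pred['token_str']!r}: score={pred['score']:.3f}")
```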